Deep learning-based polygenic risk analysis for Alzheimer’s disease prediction

Background The polygenic nature of Alzheimer’s disease (AD) suggests that multiple variants jointly contribute to disease susceptibility. As an individual’s genetic variants are constant throughout life, evaluating the combined effects of multiple disease-associated genetic risks enables reliable AD risk prediction. Because of the complexity of genomic data, current statistical analyses cannot comprehensively capture the polygenic risk of AD, resulting in unsatisfactory disease risk prediction. However, deep learning methods, which capture nonlinearity within high-dimensional genomic data, may enable more accurate disease risk prediction and improve our understanding of AD etiology. Accordingly, we developed deep learning neural network models for modeling AD polygenic risk.

Methods We constructed neural network models to model AD polygenic risk and compared them with the widely used weighted polygenic risk score and lasso models. We conducted robust linear regression analysis to investigate the relationship between the AD polygenic risk derived from deep learning methods and AD endophenotypes (i.e., plasma biomarkers and individual cognitive performance). We stratified individuals by applying unsupervised clustering to the outputs from the hidden layers of the neural network model.

Results The deep learning models outperform other statistical models for modeling AD risk. Moreover, the polygenic risk derived from the deep learning models enables the identification of disease-associated biological pathways and the stratification of individuals according to distinct pathological mechanisms.

Conclusion Our results suggest that deep learning methods are effective for modeling the genetic risks of AD and other diseases, classifying disease risks, and uncovering disease mechanisms.
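For orientation, the weighted polygenic risk score used as a baseline in the Methods is commonly computed as a weighted sum of effect-allele counts. The following is a minimal sketch under that standard definition, not the authors' implementation; the function name `weighted_prs` and the effect sizes are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def weighted_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Sum effect-allele dosages (0 to 2) weighted by per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes (e.g., log odds ratios) for three variants.
weights = [0.5, -0.25, 1.0]
# One individual's effect-allele counts at the same three variants.
dosages = [2, 1, 0]

score = weighted_prs(dosages, weights)
print(score)
```

Because the score is linear in the dosages, it cannot represent interactions between variants, which is the limitation the neural network models are intended to address.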


Alzheimer's Disease Neuroimaging Initiative cohort
We obtained genotype and phenotype data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu/). The ADNI, led by Michael W. Weiner, MD, was launched in 2003 as a public-private partnership. It is a four-stage study that aims to examine the brain's structure and function, aided by biomarker and clinical data, in people aged 55-90 years from the United States and Canada. For the present study, we included array genotype data obtained from ADNI-1, ADNI-2/GO, and ADNI-3 for analysis. After prefiltering, imputation, and postfiltering, we retained 1,382 subjects (n = 689 patients with Alzheimer's disease [AD] and 693 cognitively normal controls [NCs]) for downstream analysis. The phenotypes of the ADNI subjects are from the subjects' latest diagnostic records (updated January 2021).

National Institute on Aging Alzheimer's Disease Centers cohort
The clinical and neuropathology cores of the 29 National Institute on Aging (NIA)-funded Alzheimer's Disease Centers (ADCs) recruited and evaluated autopsy-confirmed and clinically confirmed patients with AD as well as cognitively normal elderly subjects. We retrieved the genotype and phenotype data of this AD cohort (n = 6,065) from the National Institutes of Health (NIH) database of Genotypes and Phenotypes (dbGaP) (accession number: phs000372.v1.p1). Genotype information was generated from the Illumina Human660W-Quad BeadChip or HumanOmniExpress Array. All autopsied subjects were ≥60 years old at death. Dementia in AD was determined according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria or a Clinical Dementia Rating ≥1. Further details can be found in publications arising from the corresponding dbGaP project 1,2. After prefiltering, imputation, and postfiltering, we retained 5,692 subjects (3,946 patients with AD and 1,746 cognitively NCs) for downstream analysis.

Late Onset Alzheimer's Disease Family Study cohort
The Late Onset Alzheimer's Disease (LOAD) Family Study recruited families with two or more siblings with late-onset AD as well as age- and ethnicity-matched, unrelated, nondemented controls. Patients with definite AD were diagnosed according to established neuropathological criteria (i.e., CERAD, Braak, Khachaturian, NIA-RI, or other established criteria). Probable AD or possible AD was ascertained according to the NINCDS-ADRDA (National Institute of Neurological and Communicative Diseases and Stroke/Alzheimer's Disease and Related Disorders Association) criteria. We recruited patients with AD with an age of onset or age at diagnosis ≥50 years old and NCs ≥50 years old. We retrieved genotype and phenotype data from the NIH dbGaP (accession number: phs000168.v2.p2). Individual genotypes were generated from the Human 610Quadv1_B Beadchips (Illumina). After prefiltering, imputation, and postfiltering, we retained 4,278 subjects (n = 2,046 patients with AD and 2,232 NCs) for downstream analysis. Further details can be found in a publication arising from the corresponding dbGaP project 3.

Filtering and imputation for the array datasets
We converted array genotype information from the ADNI, LOAD, and ADC datasets from PLINK to VCF format using vcfCooker (v1.1.1; https://genome.sph.umich.edu/wiki/VcfCooker). We applied prefiltering at both the individual and variant levels separately for each dataset, retaining individuals with a sample call rate ≥95% and variants with a genotype call rate ≥80%. We submitted the filtered genotype data as chromosome-separated VCF files to the TOPMed Imputation Server 4 (https://imputation.biodatacatalyst.nhlbi.nih.gov) for phasing with Eagle (v2.4) 6 and imputation using the TOPMed Imputation Reference panel (TOPMed R2) 5. We performed postimputation filtering by removing imputed variants with an imputation r² < 0.4 using the bcftools filter function. We further annotated the dbSNP ID (v154) using the bcftools annotate function. We retained single nucleotide polymorphisms with matched dbSNP IDs and alleles for subsequent polygenic score analysis. As part of the quality control analysis, we used PLINK to estimate the identity-by-descent (IBD) using variants with a minor allele frequency (MAF) > 0.01 after pruning according to an R² of 0.2, and excluded potentially duplicated subjects according to an IBD > 0.90.
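The call-rate and imputation-quality thresholds above can be illustrated with a small in-memory sketch. This is not the pipeline used in the study, which ran on PLINK and VCF files with bcftools; the data layout, the function names `prefilter` and `keep_well_imputed`, the variant IDs, and the choice to filter samples before variants are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: sample ID -> genotype per variant, with None marking a missing call.
Genotypes = Dict[str, List[Optional[int]]]


def call_rate(calls: List[Optional[int]]) -> float:
    """Fraction of non-missing genotype calls."""
    return sum(c is not None for c in calls) / len(calls)


def prefilter(genotypes: Genotypes, sample_min: float = 0.95,
              variant_min: float = 0.80) -> Tuple[Genotypes, List[int]]:
    """Keep samples with call rate >= sample_min, then variants with call rate >= variant_min."""
    kept_samples = {s: g for s, g in genotypes.items() if call_rate(g) >= sample_min}
    n_variants = len(next(iter(genotypes.values())))
    kept_variants = [
        j for j in range(n_variants)
        if kept_samples
        and call_rate([g[j] for g in kept_samples.values()]) >= variant_min
    ]
    filtered = {s: [g[j] for j in kept_variants] for s, g in kept_samples.items()}
    return filtered, kept_variants


def keep_well_imputed(variants: List[dict], min_r2: float = 0.4) -> List[dict]:
    """Drop imputed variants whose imputation r2 is below min_r2."""
    return [v for v in variants if v["r2"] >= min_r2]


# Toy data: 5 samples, 20 variants, all genotypes called except where set to None.
genotypes = {s: [0] * 20 for s in "ABCDE"}
genotypes["C"][0] = None          # sample C: 19/20 calls, passes the 95% threshold
for j in (1, 2, 3):
    genotypes["D"][j] = None      # sample D: 17/20 calls, fails the 95% threshold

filtered, kept = prefilter(genotypes)

# Hypothetical imputed variants with their imputation r2 values.
imputed = [
    {"id": "rs_hypothetical_1", "r2": 0.95},
    {"id": "rs_hypothetical_2", "r2": 0.39},
    {"id": "rs_hypothetical_3", "r2": 0.40},
]
well_imputed = keep_well_imputed(imputed)
```

In the toy data, sample D is removed first; variant 0 is then called in only three of the four remaining samples (75%) and is dropped by the variant-level filter.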

Supplementary Figure 1. Performance of the different weighted polygenic risk score models for disease classification accuracy in the European-descent cohorts
(a) Performance of the different weighted polygenic risk score models for disease classification accuracy in the European-descent cohorts using different variant sets selected by different p-value thresholds. (b) Comparison of the different models for disease classification accuracy. The optimal condition for each model was included in the plot. Data are shown as means with 95% confidence intervals. auROC, area under the receiver operating characteristic curve; LD, linkage disequilibrium; wPRS, weighted polygenic risk score.

Supplementary Figure 2. Performance of different prediction models for disease classification accuracy in the European-descent cohorts without validation
(a) Performance of the different prediction models for disease classification accuracy in the European-descent cohorts without validation. (b) Comparison of the different models for disease classification accuracy. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: ***p < 0.001. auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LD, linkage disequilibrium; wPRS, weighted polygenic risk score.

Supplementary Figure 3. Performance of different prediction models for disease classification accuracy in the European-descent cohorts using the five-fold cross-validation method
(a) Performance of the different prediction models for disease classification accuracy in the European-descent cohorts using the five-fold cross-validation method. (b) Comparison of the different models for disease classification accuracy. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: ***p < 0.001 (n = 10 data points per category). auROC, area under the receiver operating characteristic curve; LD, linkage disequilibrium; lasso, least absolute shrinkage and selection operator; wPRS, weighted polygenic risk score.
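As an illustration of the five-fold cross-validation referred to in this caption, the sketch below partitions sample indices into five folds. Stratifying by case/control label, the fixed random seed, and the function name `stratified_five_fold` are assumptions; the caption does not state how the folds were drawn.

```python
import random
from typing import List, Sequence, Tuple


def stratified_five_fold(labels: Sequence[int], k: int = 5,
                         seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits with each class spread evenly over the folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin into the folds.
        for position, index in enumerate(members):
            folds[position % k].append(index)
    splits = []
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold for i in fold)
        splits.append((train, test))
    return splits


# Toy labels: 10 cases (1) and 10 controls (0).
labels = [1] * 10 + [0] * 10
splits = stratified_five_fold(labels)
```

Each model would then be fitted on the `train` indices of a split and scored on the corresponding `test` indices, so that every individual is used for testing exactly once.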

Supplementary Figure 4. Optimization of the neural network model for classifying Alzheimer's disease risk using an independent cohort for cross-validation
(a-h) Model performance of the neural network model during the model training. ADC, National Institute on Aging Alzheimer's Disease Centers cohort; LOAD, Late Onset Alzheimer's Disease Family Study cohort; WGS, whole-genome sequencing.

Supplementary Figure 5. Performance of different polygenic score models in the European-descent Alzheimer's disease cohorts
Comparisons of the auROCs and auPRCs of AD cohorts obtained by different models. For auROCs, data are shown as means with 95% confidence intervals. Bootstrap one-tailed test: *p < 0.05, **p < 0.01, ***p < 0.001. For auPRCs, data are shown as means. AD, Alzheimer's disease; ADC, National Institute on Aging Alzheimer's Disease Centers cohort; ADNI, Alzheimer's Disease Neuroimaging Initiative cohort; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LOAD, Late Onset Alzheimer's Disease Family Study cohort; NN, neural network; wPRS, weighted polygenic risk score using results from Jansen's summary statistics; wPRS2, a parallel weighted polygenic risk score analysis using results from IGAP 2019 summary statistics.
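The auROC comparison described in this caption can be sketched as follows. The auROC is computed from its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control), and the one-tailed p-value is the fraction of bootstrap resamples in which the first model does not outperform the second. The resampling scheme, the number of resamples, and the function names are assumptions, since the caption names only a "bootstrap one-tailed test".

```python
import random
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outranks a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes are required")
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0
               for c in cases for n in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_one_tailed_p(labels: Sequence[int], scores_a: Sequence[float],
                           scores_b: Sequence[float], n_boot: int = 1000,
                           seed: int = 0) -> float:
    """Fraction of resamples in which model A's auROC does not exceed model B's."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip resamples that contain a single class
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    return sum(d <= 0 for d in diffs) / n_boot


# Toy example: model A separates the classes perfectly, model B ranks them in reverse.
labels = [0, 0, 0, 1, 1, 1]
scores_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
scores_b = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
p_value = bootstrap_one_tailed_p(labels, scores_a, scores_b)
```

The same resampled auROC values could be used to obtain percentile-based 95% confidence intervals of the kind shown in the figure.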

Supplementary Figure 6. Performance of different prediction models for disease classification accuracy in the European-descent cohorts stratified by ethnic group
(a) Performance of the different prediction models for disease classification accuracy in the European-descent cohorts stratified by ethnic group. (b) Comparison of the different models for disease classification accuracy. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: *p < 0.05, ***p < 0.001. (c) Visualization of the polygenic risk score distribution stratified by phenotype and ethnic group. AD, Alzheimer's disease; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; NC, normal control; wPRS, weighted polygenic risk score.

Supplementary Figure 7. Performance of different prediction models for disease classification accuracy in the European-descent population stratified by sex
(a) Performance of the different prediction models for disease classification accuracy in the European-descent population stratified by sex. (b) Comparison of the different models for disease classification accuracy, stratified by sex. Data are means with 95% confidence intervals. auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; wPRS, weighted polygenic risk score.

Supplementary Figure 8. Performance of different prediction models for disease classification accuracy in the European-descent population stratified by age group
(a) Performance of the different prediction models for disease classification accuracy in the European-descent population stratified by age group. (b) Comparison of the different models for disease classification accuracy, stratified by age group. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: *p < 0.05, **p < 0.01, ***p < 0.001. auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; wPRS, weighted polygenic risk score.

[...] and results from the WGS1 cohort (red). Colors in the right panel denote different reference datasets used to construct the models: European-descent datasets (i.e., the ADC, LOAD, and ADNI datasets; blue), and the WGS1 cohort (red). Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: **p < 0.01, ***p < 0.001. ADC, National Institute on Aging Alzheimer's Disease Centers cohort; ADNI, Alzheimer's Disease Neuroimaging Initiative cohort; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LOAD, Late Onset Alzheimer's Disease Family Study cohort; p, p-value; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1; WGS2, Chinese WGS cohort 2; wPRS, weighted polygenic risk score.

Supplementary Figure 10. Genomic correlations among the polygenic risk scores obtained from the trans-ethnic prediction models in Chinese WGS cohort 1
R² was calculated using Spearman's rank correlation test. lasso, least absolute shrinkage and selection operator; NN, neural network; p, p-value; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1; wPRS, weighted polygenic risk score.

Supplementary Figure 11. Performance of polygenic risk models for classifying Alzheimer's disease risk in the Chinese Alzheimer's disease whole-genome sequencing cohorts
(a-b) ROC curves for model performance in classifying (a) AD and (b) MCI patients in Chinese WGS cohort 1. (c-f) Model performance for classifying AD and MCI patients in Chinese WGS cohorts 1 and 2. For auROC, the y-axis shows the mean auROC, with error bars representing 95% confidence intervals. The data were analyzed using a bootstrap two-tailed test: *p < 0.1, **p < 0.01, ***p < 0.001. For auPRC, the y-axis shows the mean auPRC. AD, Alzheimer's disease; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; Lasso_APOE, lasso model constructed using variants in APOE regions; lasso_nonAPOE, lasso model constructed using variants outside of APOE regions; MCI, mild cognitive impairment; NC, normal control; NN, neural network; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1; WGS2, Chinese WGS cohort 2; wPRS, weighted polygenic risk score.

Supplementary Figure 13. Evaluation of different prediction models using 37 variants for disease classification accuracy in European-descent cohorts and Chinese WGS cohort 1 using the five-fold cross-validation method
(a) Performance of the different prediction models using 37 variants for disease classification accuracy in European-descent cohorts and the WGS1 dataset using the five-fold cross-validation method. (b) Comparison of the different models for disease classification accuracy. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: **p < 0.01, ***p < 0.001 (n = 10 data points per category). auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1; wPRS, weighted polygenic risk score.

Supplementary Figure 14. Comparison between models using 37 variants and variants selected by p-value thresholds for classifying Alzheimer's disease risk in the European-descent cohorts using the five-fold cross-validation method
Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: ***p < 0.001 (n = 10 data points per category). auROC, area under the receiver operating characteristic curve; p, p-value; wPRS, weighted polygenic risk score.

Supplementary Figure 15. Comparison between models using 37 variants and variants selected by p-value thresholds for classifying Alzheimer's disease risk in the Chinese population
Data are means with 95% confidence intervals. Two-way ANOVA followed by Benjamini-Hochberg's post hoc test comparing results from 37 variants and other groups: *p < 0.05, ***p < 0.001. auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; p, p-value; wPRS, weighted polygenic risk score.
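For readers unfamiliar with the Benjamini-Hochberg procedure named in this caption, the sketch below shows the standard step-up adjustment of a set of p-values. It covers only the multiple-testing adjustment, not the two-way ANOVA, and the input p-values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

A comparison would be reported as significant at a given false discovery rate when its adjusted p-value falls below that rate.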

Supplementary Figure 16. Performance of trans-ethnic prediction models using 37 variants for disease classification accuracy in the European-descent cohorts and WGS1 dataset
(a) Performance of trans-ethnic prediction models using 37 variants for disease classification accuracy in the European-descent cohorts and WGS1 dataset. (b) Comparison of the different models for disease classification accuracy in different ethnic groups. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: *p < 0.05, **p < 0.01, ***p < 0.001. auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1; WGS2, Chinese WGS cohort 2; wPRS, weighted polygenic risk score.

Supplementary Figure 17. Classification of Alzheimer's disease in the Chinese population using neural network models with different variant sets
Dot plots show the classification accuracy of neural network models constructed with five-fold cross-validation based on the following: all 216 AD variants reported by genome-wide association studies, 10 sets of 37 variants randomly selected from those 216 variants ("Randomly selected"), and the 37 AD variants that showed significant associations in the Chinese population ("AD-associated"). Data are means ± SEM. One-way ANOVA followed by Tukey's post hoc test comparing (a) auROC and (b) auPRC with all other variant groups: ***p < 0.001. AD, Alzheimer's disease; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; SEM, standard error of the mean.

Comparison of different models with respect to disease classification accuracy. Data are means with 95% confidence intervals. One-way ANOVA followed by Bonferroni's post hoc test: ***p < 0.001. auROC, area under the receiver operating characteristic curve; GNN, graph neural network; lasso, least absolute shrinkage and selection operator; UTR, untranslated region; wPRS, weighted polygenic risk score.

[...] Figure 1) Cohorts for polygenic score model testing (N = 11,352 [...]

Supplementary Table 3. Performance of the weighted polygenic risk score models for disease classification accuracy in the European-descent cohorts (for Figure 1)
Classification accuracy was measured as auROC. auROC, area under the receiver operating characteristic curve; LD, linkage disequilibrium.

Supplementary Table 4. Performance of the modified weighted polygenic risk score models for disease classification accuracy in the European-descent cohorts (for Supplementary Figure 1)
Classification accuracy was measured as auROC. auROC, area under the receiver operating characteristic curve; LD, linkage disequilibrium; N/A, not applicable.

Supplementary Table 5. Evaluation of different prediction models for disease classification accuracy in the European-descent cohorts without validation (for Supplementary Figure 2)
Classification accuracy was measured as auROC. auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LD, linkage disequilibrium.

Supplementary Table 6. Evaluation of different prediction models for disease classification accuracy in the European-descent cohorts using the five-fold crossvalidation method (for Supplementary Figure 3)
Classification accuracy was measured as auROC. auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; LD, linkage disequilibrium.

Supplementary Table 7. Evaluation of different prediction models for disease classification accuracy in independent European-descent cohorts (for Figures 2a-d, Supplementary Figure 4)
Classification accuracy was measured as auROC or auPRC. AD, Alzheimer's disease; ADC, National Institute on Aging Alzheimer's Disease Centers cohort; ADNI, Alzheimer's Disease Neuroimaging Initiative cohort; auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; CI, confidence interval; lasso, least absolute shrinkage and selection operator [...]

auPRC, area under the precision-recall curve; auROC, area under the receiver operating characteristic curve; lasso, least absolute shrinkage and selection operator; n, number of sites; wPRS, weighted polygenic risk score.

[...] Figures 9, 10) Lasso, least absolute shrinkage and selection operator; LD, linkage disequilibrium; WGS, whole-genome sequencing; WGS1, Chinese WGS cohort 1.

Transcript abundance is indicated as the FPKM values obtained from the RNA-sequencing data. FPKM, fragments per kilobase per million mapped fragments.